Improving the Energy and Execution Efficiency of a Small Instruction Cache by Using an Instruction Register File
Abstract
Small filter caches (L0 caches) can significantly reduce energy consumption in embedded systems, but this benefit comes at the cost of increased execution time due to frequent L0 cache misses. The Instruction Register File (IRF) is an architectural extension that provides improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application’s instructions, resulting in decreased code size, reduced energy consumption, and improved execution time, primarily due to a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, thus providing a means of tolerating increased fetch latencies. This paper explores the use of an L0 cache enhanced with an IRF to reduce energy consumption further while improving execution time. The results indicate that the IRF is an effective means of offsetting execution time penalties due to pipeline frontend bottlenecks. We also show that combining an IRF with an L0 cache achieves reductions in fetch energy that are greater than those obtained by using either feature in isolation.
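The packing idea described in the abstract can be illustrated with a rough sketch. The code below is not the paper's compiler algorithm: it is a toy model that works on a dynamic instruction trace, whereas real packing is done statically by the compiler. The function names `select_irf` and `count_fetches`, the parameters `irf_size` and `max_pack`, and the example trace are all hypothetical, chosen only for illustration. It picks the most frequent instructions for the IRF and then counts how many instruction-cache fetches remain when runs of IRF-resident instructions share a single fetched word.

```python
from collections import Counter


def select_irf(trace, irf_size):
    """Return the set of the irf_size most frequent instructions in the trace.

    Assumption: frequency alone decides IRF residency (a simplification).
    """
    counts = Counter(trace)
    return {insn for insn, _ in counts.most_common(irf_size)}


def count_fetches(trace, irf, max_pack):
    """Count instruction-cache fetches for a trace.

    Assumption: up to max_pack consecutive IRF-resident instructions are
    packed into one fetched word; every other instruction costs one fetch.
    """
    fetches = 0
    run = 0  # IRF references already placed in the current packed word
    for insn in trace:
        if insn in irf:
            if run == 0:
                fetches += 1  # start a new packed word
            run += 1
            if run == max_pack:
                run = 0  # packed word is full
        else:
            fetches += 1  # ordinary instruction, fetched on its own
            run = 0       # a non-IRF instruction ends the current pack
    return fetches


# Hypothetical trace of instruction mnemonics, for illustration only.
trace = ["add", "ld", "add", "ld", "add", "st", "mul", "add", "ld"]
irf = select_irf(trace, irf_size=2)               # {"add", "ld"}
unpacked = count_fetches(trace, set(), max_pack=3)  # 9 fetches
packed = count_fetches(trace, irf, max_pack=3)      # 5 fetches
```

In this toy trace, packing reduces the fetch count from 9 to 5. Fewer fetches mean fewer accesses to the L0 cache, which is the mechanism the abstract credits for both the energy savings and the tolerance of fetch latency; the sketch does not model energy or timing.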
Publication date: 2007